{
 "cells": [
  {
   "cell_type": "markdown",
   "id": "biological-register",
   "metadata": {},
   "source": [
    "# Applying Single-Node Data Cache\n",
    "Authors: MindSpore team, [陈超然](https://gitee.com/sunny_ccr)  \n",
    "`Linux`  `Ascend` `GPU` `CPU` `Data Preparation` `Intermediate` `Expert`"
   ]
  },
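  {
   "cell_type": "markdown",
   "id": "considerable-stack",
   "metadata": {},
   "source": [
    "## Overview\n",
    "When a dataset has to be accessed repeatedly from a remote location or read repeatedly from disk, the single-node cache operator can cache it in local memory to speed up dataset reading.\n",
    "\n",
    "This tutorial demonstrates how to use the single-node cache service to cache data that has already been processed by data augmentation."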
   ]
  },
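  {
   "cell_type": "markdown",
   "id": "closing-birthday",
   "metadata": {},
   "source": [
    "## Configuring the Environment\n",
    "Before using the cache service, install MindSpore and set the relevant environment variables. Taking a Conda environment as an example, the `LD_LIBRARY_PATH` and `PATH` environment variables need to be configured:"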
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 1,
   "id": "pretty-johnson",
   "metadata": {},
   "outputs": [
    {
     "name": "stderr",
     "output_type": "stream",
     "text": [
      "[WARNING] ME(4968:139868289333056,MainProcess):2021-02-25-22:00:42.129.964 [mindspore/ops/operations/array_ops.py:2302] WARN_DEPRECATED: The usage of Pack is deprecated. Please use Stack.\n"
     ]
    },
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "/usr/local/cuda/bin:/home/sunny/miniconda3/envs/seb/bin:/home/sunny/miniconda3/condabin:/usr/local/cuda-10.1/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/usr/games:/usr/local/games:/snap/bin\n",
      "/home/sunny/miniconda3/envs/seb/lib/python3.7/site-packages/mindspore:/home/sunny/miniconda3/envs/seb/lib/python3.7/site-packages/mindspore/lib\n"
     ]
    }
   ],
   "source": [
    "import os\n",
    "import sys\n",
    "import mindspore\n",
    "\n",
    "# directory containing the Python interpreter of the current environment\n",
    "python_path = os.path.dirname(sys.executable)\n",
    "# MindSpore package directory and its bundled lib directory\n",
    "mindspore_path = os.path.dirname(mindspore.__file__)\n",
    "mindspore_lib_path = os.path.join(mindspore_path, \"lib\")\n",
    "\n",
    "if 'PATH' not in os.environ:\n",
    "    os.environ['PATH'] = python_path\n",
    "elif python_path not in os.environ['PATH']:\n",
    "    os.environ['PATH'] += \":\" + python_path\n",
    "print(os.environ['PATH'])\n",
    "\n",
    "os.environ['LD_LIBRARY_PATH'] = \"{}:{}\".format(mindspore_path, mindspore_lib_path)\n",
    "print(os.environ['LD_LIBRARY_PATH'])"
   ]
  },
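  {
   "cell_type": "markdown",
   "id": "env-path-sketch",
   "metadata": {},
   "source": [
    "The `PATH` update above relies on a substring check, and `LD_LIBRARY_PATH` is overwritten rather than extended. As a minimal sketch, assuming existing entries should be preserved, a helper could append a directory only when it is not already listed. The helper name `append_to_env_path` and the example paths are hypothetical and not part of MindSpore:\n",
    "\n",
    "```python\n",
    "import os\n",
    "\n",
    "\n",
    "def append_to_env_path(name, directory, environ=os.environ):\n",
    "    # Split the variable into whole entries, dropping empty ones.\n",
    "    entries = [e for e in environ.get(name, '').split(os.pathsep) if e]\n",
    "    # Append only if the exact directory is not already listed.\n",
    "    if directory not in entries:\n",
    "        entries.append(directory)\n",
    "    environ[name] = os.pathsep.join(entries)\n",
    "    return environ[name]\n",
    "\n",
    "\n",
    "# Demonstrate on a plain dict so the real environment is left untouched.\n",
    "# Both paths below are made-up examples.\n",
    "demo_env = {'LD_LIBRARY_PATH': '/usr/local/cuda/lib64'}\n",
    "append_to_env_path('LD_LIBRARY_PATH', '/opt/example/lib', demo_env)\n",
    "append_to_env_path('LD_LIBRARY_PATH', '/opt/example/lib', demo_env)  # no duplicate added\n",
    "print(demo_env['LD_LIBRARY_PATH'])\n",
    "```\n",
    "\n",
    "Comparing whole entries avoids a false match when one directory name is a prefix of another, and appending keeps any library paths that were configured before the notebook started."
   ]
  },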
  {
   "cell_type": "markdown",
   "id": "sufficient-christopher",
   "metadata": {},
   "source": [
    "## Starting the Cache Server\n",
    "Before using the single-node cache service, start the cache server:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 2,
   "id": "known-webster",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Cache server startup completed successfully!\n",
      "The cache server daemon has been created as process id 5005 and listening on port 50052\n",
      "\n",
      "Recommendation:\n",
      "Since the server is detached into its own daemon process, monitor the server logs (under /tmp/mindspore/cache/log) for any issues that may happen after startup\n",
      "\n"
     ]
    }
   ],
   "source": [
    "!cache_admin --start"
   ]
  },
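  {
   "cell_type": "markdown",
   "id": "sized-invitation",
   "metadata": {},
   "source": [
    "If an error reports that `libpython3.7m.so.1.0` cannot be found, try locating the file under the virtual environment and adding its directory to the environment variable:"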
   ]
  },
  {
   "cell_type": "raw",
   "id": "convenient-diameter",
   "metadata": {},
   "source": [
    "export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:{path_to_conda}/envs/{your_env_name}/lib"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "stone-ceiling",
   "metadata": {},
   "source": [
    "## Creating a Cache Session\n",
    "If no cache session exists on the cache server, create one to obtain a cache session id:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 3,
   "id": "convinced-dinner",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Session created for server on port 50052: 4173327275\n"
     ]
    }
   ],
   "source": [
    "!cache_admin -g"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "unexpected-addiction",
   "metadata": {},
   "source": [
    "The cache session id is randomly assigned by the server."
   ]
  },
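  {
   "cell_type": "markdown",
   "id": "session-id-sketch",
   "metadata": {},
   "source": [
    "Because the id changes with every new session, copying it by hand is error-prone. The following is a minimal sketch of extracting it from the command output, assuming the message keeps the format shown above. The function name `parse_session_id` is hypothetical, and the output format may differ between MindSpore versions:\n",
    "\n",
    "```python\n",
    "import re\n",
    "\n",
    "# Sample line copied from the `cache_admin -g` output shown above.\n",
    "sample_output = 'Session created for server on port 50052: 4173327275'\n",
    "\n",
    "\n",
    "def parse_session_id(text):\n",
    "    # Capture the number that follows the port in the 'Session created' message.\n",
    "    match = re.search('Session created for server on port [0-9]+: ([0-9]+)', text)\n",
    "    if match is None:\n",
    "        raise ValueError('no session id found in cache_admin output')\n",
    "    return int(match.group(1))\n",
    "\n",
    "\n",
    "session_id = parse_session_id(sample_output)\n",
    "print(session_id)\n",
    "```\n",
    "\n",
    "In a notebook, the command output could be captured with `out = !cache_admin -g` and joined into a single string before parsing. The resulting integer could then be passed as the `session_id` argument of `ds.DatasetCache` instead of a hard-coded number."
   ]
  },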
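  {
   "cell_type": "markdown",
   "id": "linear-slovenia",
   "metadata": {},
   "source": [
    "## Creating a Cache Instance\n",
    "In the script, use the `DatasetCache` API to define a cache instance named `some_cache`, passing the cache session id created in the previous step as the `session_id` parameter:"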
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 4,
   "id": "appreciated-tonight",
   "metadata": {},
   "outputs": [],
   "source": [
    "import mindspore.dataset as ds\n",
    "\n",
    "some_cache = ds.DatasetCache(session_id=4173327275, size=0, spilling=False)"
   ]
  },
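  {
   "cell_type": "markdown",
   "id": "chronic-graphic",
   "metadata": {},
   "source": [
    "## Inserting the Cache Instance\n",
    "The following example uses the CIFAR-10 dataset. Before running it, download and store the CIFAR-10 dataset as described in the dataset loading tutorial. The directory structure is as follows:"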
   ]
  },
  {
   "cell_type": "raw",
   "id": "incorporate-wrestling",
   "metadata": {},
   "source": [
    "├─my_training_script.py\n",
    "└─cifar-10-batches-bin\n",
    "    ├── batches.meta.txt\n",
    "    ├── data_batch_1.bin\n",
    "    ├── data_batch_2.bin\n",
    "    ├── data_batch_3.bin\n",
    "    ├── data_batch_4.bin\n",
    "    ├── data_batch_5.bin\n",
    "    ├── readme.html\n",
    "    └── test_batch.bin"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "bearing-humidity",
   "metadata": {},
   "source": [
    "Pass the `some_cache` instance created earlier as the `cache` parameter of the `map` operation that applies the data augmentation operator:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 5,
   "id": "separated-closure",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "0 image shape: (32, 32, 3)\n",
      "1 image shape: (32, 32, 3)\n",
      "2 image shape: (32, 32, 3)\n",
      "3 image shape: (32, 32, 3)\n",
      "4 image shape: (32, 32, 3)\n"
     ]
    }
   ],
   "source": [
    "import mindspore.dataset.vision.c_transforms as c_vision\n",
    "\n",
    "dataset_dir = \"cifar-10-batches-bin/\"\n",
    "data = ds.Cifar10Dataset(dataset_dir=dataset_dir, num_samples=5, shuffle=False, num_parallel_workers=1)\n",
    "\n",
    "# apply the cache to the map operation\n",
    "rescale_op = c_vision.Rescale(1.0 / 255.0, -1.0)\n",
    "data = data.map(input_columns=[\"image\"], operations=rescale_op, cache=some_cache)\n",
    "\n",
    "num_iter = 0\n",
    "for item in data.create_dict_iterator(num_epochs=1):  # each item is a dictionary\n",
    "    # in this example, each dictionary has a key \"image\"\n",
    "    print(\"{} image shape: {}\".format(num_iter, item[\"image\"].shape))\n",
    "    num_iter += 1"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "multiple-bubble",
   "metadata": {},
   "source": [
    "The `cache_admin --list_sessions` command shows that the current session holds five cached rows, which indicates that the data was cached successfully."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 6,
   "id": "shared-capture",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Listing sessions for server on port 50052\n",
      "\n",
      "     Session    Cache Id  Mem cached Disk cached  Avg cache size  Numa hit\n",
      "  4173327275   575278224           5         n/a           12442         5\n"
     ]
    }
   ],
   "source": [
    "!cache_admin --list_sessions"
   ]
  },
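  {
   "cell_type": "markdown",
   "id": "list-sessions-sketch",
   "metadata": {},
   "source": [
    "The same check can be done programmatically. The sketch below parses the listing shown above into dictionaries, assuming one whitespace-separated data row per cache with six fields. The function name `parse_sessions` and the column keys are chosen here for illustration, and the layout may differ between MindSpore versions:\n",
    "\n",
    "```python\n",
    "# Output lines copied from the `cache_admin --list_sessions` cell above.\n",
    "sample_lines = [\n",
    "    'Listing sessions for server on port 50052',\n",
    "    '',\n",
    "    '     Session    Cache Id  Mem cached Disk cached  Avg cache size  Numa hit',\n",
    "    '  4173327275   575278224           5         n/a           12442         5',\n",
    "]\n",
    "\n",
    "# Illustrative keys, one per column of the listing.\n",
    "COLUMNS = ['session', 'cache_id', 'mem_cached', 'disk_cached', 'avg_cache_size', 'numa_hit']\n",
    "\n",
    "\n",
    "def parse_sessions(lines):\n",
    "    # Keep only rows with six fields whose first field is numeric (the data rows).\n",
    "    rows = []\n",
    "    for line in lines:\n",
    "        fields = line.split()\n",
    "        if len(fields) == len(COLUMNS) and fields[0].isdigit():\n",
    "            rows.append(dict(zip(COLUMNS, fields)))\n",
    "    return rows\n",
    "\n",
    "\n",
    "rows = parse_sessions(sample_lines)\n",
    "cached = int(rows[0]['mem_cached'])\n",
    "print(rows[0]['session'], cached)\n",
    "```\n",
    "\n",
    "Values are kept as strings because a column such as `Disk cached` may hold `n/a`; only the field being checked is converted to an integer."
   ]
  },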
  {
   "cell_type": "markdown",
   "id": "grand-active",
   "metadata": {},
   "source": [
    "## Destroying the Cache Session\n",
    "After training finishes, the current cache can be destroyed to release its memory:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 7,
   "id": "happy-three",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Drop session successfully for server on port 50052\n"
     ]
    }
   ],
   "source": [
    "!cache_admin --destroy_session 4173327275"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "contemporary-climb",
   "metadata": {},
   "source": [
    "The command above destroys the cache of the cache session whose id is 4173327275."
   ]
  },
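  {
   "cell_type": "markdown",
   "id": "metric-antibody",
   "metadata": {},
   "source": [
    "## Shutting Down the Cache Server\n",
    "When the cache server is no longer needed, it can be shut down. This destroys all cache sessions on the server and releases their memory."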
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 8,
   "id": "hazardous-lawrence",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Cache server on port 50052 has been stopped successfully.\n"
     ]
    }
   ],
   "source": [
    "!cache_admin --stop"
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.7.5"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 5
}
